NSF PAR Search | NSF Public Access Repository

Attention-Only Transformers via Unrolled Subspace Denoising

Wang, Peng; Lu, Yifu; Yu, Yaodong; Pai, Druv; Qu, Qing; Ma, Yi (July 2025, International Conference on Machine Learning 2025)

Full Text Available
Attention-Only Transformers via Unrolled Subspace Denoising

Wang, Peng; Lu, Yifu; Yu, Yaodong; Pai, Druv; Qu, Qing; Ma, Yi (May 2025, International Conference on Machine Learning)

Full Text Available
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction

Wu, Ziyang; Ding, Tianjiao; Lu, Yifu; Pai, Druv; Zhang, Jingyuan; Wang, Weida; Yu, Yaodong; Ma, Yi; Haeffele, Benjamin D (April 2025, International Conference on Learning Representations)

Full Text Available
Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing

Chen, Siyi; Zhang, Huijie; Guo, Minzhe; Lu, Yifu; Wang, Peng; Qu, Qing (December 2024, Advances in Neural Information Processing Systems)

Recently, diffusion models have emerged as a powerful class of generative models. Despite their success, there is still limited understanding of their semantic spaces. This makes it challenging to achieve precise and disentangled image generation without additional training, especially in an unsupervised way. In this work, we improve the understanding of their semantic spaces through two intriguing observations: within a certain range of noise levels, (1) the learned posterior mean predictor (PMP) in the diffusion model is locally linear, and (2) the singular vectors of its Jacobian lie in low-dimensional semantic subspaces. We provide a solid theoretical basis to justify the linearity and low-rankness of the PMP. These insights allow us to propose an unsupervised, single-step, training-free LOw-rank COntrollable image editing (LOCO Edit) method for precise local editing in diffusion models. LOCO Edit identifies editing directions with nice properties: homogeneity, transferability, composability, and linearity. These properties of LOCO Edit benefit greatly from the low-dimensional semantic subspace. Our method can further be extended to unsupervised or text-supervised editing in various text-to-image diffusion models (T-LOCO Edit). Finally, extensive empirical experiments demonstrate the effectiveness and efficiency of LOCO Edit.
Full Text Available
Exploring Low-Dimensional Subspace in Diffusion Models for Controllable Image Editing

Chen, Siyi; Zhang, Huijie; Guo, Minzhe; Lu, Yifu; Wang, Peng; Qu, Qing (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
Exploring Low-Dimensional Subspace in Diffusion Models for Controllable Image Editing

Chen, Siyi; Zhang, Huijie; Guo, Minzhe; Lu, Yifu; Wang, Peng; Qu, Qing (November 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

Recently, diffusion models have emerged as a powerful class of generative models. Despite their success, there is still limited understanding of their semantic spaces. This makes it challenging to achieve precise and disentangled image generation without additional training, especially in an unsupervised way. In this work, we improve the understanding of their semantic spaces through two intriguing observations: within a certain range of noise levels, (1) the learned posterior mean predictor (PMP) in the diffusion model is locally linear, and (2) the singular vectors of its Jacobian lie in low-dimensional semantic subspaces. We provide a solid theoretical basis to justify the linearity and low-rankness of the PMP. These insights allow us to propose an unsupervised, single-step, training-free LOw-rank COntrollable image editing (LOCO Edit) method for precise local editing in diffusion models. LOCO Edit identifies editing directions with nice properties: homogeneity, transferability, composability, and linearity. These properties of LOCO Edit benefit greatly from the low-dimensional semantic subspace. Our method can further be extended to unsupervised or text-supervised editing in various text-to-image diffusion models (T-LOCO Edit). Finally, extensive empirical experiments demonstrate the effectiveness and efficiency of LOCO Edit. The code and the arXiv version can be found on the project website.
Full Text Available
The Emergence of Reproducibility and Consistency in Diffusion Models

Zhang, Huijie; Zhou, Jinfan; Lu, Yifu; Guo, Minzhe; Shen, Liyue; Qu, Qing (June 2024, International Conference on Machine Learning)

Full Text Available
Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture

Zhang, Huijie; Lu, Yifu; Alkhouri, Ismail; Ravishankar, Saiprasad; Song, Dogyoon; Qu, Qing (June 2024, Conference on Computer Vision and Pattern Recognition)

Diffusion models, emerging as powerful deep generative tools, excel in various applications. They operate through a two-step process: introducing noise into training samples and then employing a model to convert random noise into new samples (e.g., images). However, their remarkable generative performance is hindered by slow training and sampling. This is due to the necessity of tracking extensive forward and reverse diffusion trajectories, and employing a large model with numerous parameters across multiple timesteps (i.e., noise levels). To tackle these challenges, we present a multi-stage framework inspired by our empirical findings. These observations indicate the advantages of employing distinct parameters tailored to each timestep while retaining universal parameters shared across all timesteps. Our approach involves segmenting the time interval into multiple stages, where we employ a custom multi-decoder U-Net architecture that blends time-dependent models with a universally shared encoder. Our framework enables the efficient distribution of computational resources and mitigates inter-stage interference, which substantially improves training efficiency. Extensive numerical experiments affirm the effectiveness of our framework, showcasing significant training and sampling efficiency enhancements on three state-of-the-art diffusion models, including large-scale latent diffusion models. Furthermore, our ablation studies illustrate the impact of two important components in our framework: (i) a novel timestep clustering algorithm for stage division, and (ii) an innovative multi-decoder U-Net architecture, seamlessly integrating universal and customized hyperparameters.
Full Text Available
Dissecting Distribution Inference

https://doi.org/10.1109/SaTML54575.2023.00019

Suri, Anshuman; Lu, Yifu; Chen, Yanjin; Evans, David (February 2023, IEEE Conference on Secure and Trustworthy Machine Learning (SaTML))

Full Text Available
